{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 读/写 excel\n",
    "\n",
    "欢迎大家回到 Machine Learning Diary~👏\n",
    "\n",
    "今天这节课主要带大家熟悉一下怎么使用python来操作Excel文件。基本上Excel的处理与DataFrame格式是分不开的，可以理解为DataFrame就是Excel在python里的一种表现形式。\n",
    "\n",
    "打开Jupyter，首先按照上节课教的自己生成一个DataFrame："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3\n",
       "0     1     4     7\n",
       "1     2     5     8\n",
       "2     3     6     9\n",
       "3     4     7    10"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "\n",
    "## 创建DataFrame ##\n",
    "df = pd.DataFrame({'col1':[1,2,3,4], 'col2':[4,5,6,7], 'col3':[7,8,9,10]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来尝试把这个dataframe保存成一个excel文件，首先要确定文件保存的路径。我们先来查看下当前路径："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "D:\\cwzy\\Girls-In-AI\\machine_learning_diary\\base\\read_write\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "path = os.path.dirname(os.path.abspath('__file__'))  # 查看当前文件的路径\n",
    "# path = os.getcwd()                                 # 查看路径的另一种方式\n",
    "print(path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "上面的这个\"D:\\cwzy\\Girls-In-AI\\machine_learning_diary\\base\\read_write\"就是我当前这个jupyter文件的当前路径。\n",
    "\n",
    "现在我要把df保存到相同的路径下面，使用to_excel()就可以，在括号内写上你想保存的文件名："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_excel('dataframe.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行完上一句之后，打开\"D:\\cwzy\\Girls-In-AI\\machine_learning_diary\\base\\read_write\"路径下查看，已经出现\"dataframe.xlsx\"，打开excel文件后里面的内容就是刚刚生成的表格数据。当然你的路径可能跟我的不一样，打开你对应的路径就可以了。\n",
    "\n",
    "如果你需要切换路径，先使用os.chdir来切换文件夹，比如你想把表格保存到桌面，运行下面代码："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.chdir(\"C:/Users/ZhangYi/Desktop\")  # 这里换上你的桌面路径\n",
    "df.to_excel('dataframe.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行之后，你到桌面就能看见刚刚生成的excel文件了。\n",
    "\n",
    "好了，现在你已经知道如何保存一个excel文件了。接下来尝试读取一个excel，很简单，使用pandas包的read_excel()函数就可以了，在括号内写上你想读取的excel路径名："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3\n",
       "0     1     4     7\n",
       "1     2     5     8\n",
       "2     3     6     9\n",
       "3     4     7    10"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2 = pd.read_excel('dataframe.xlsx')  # 读取刚才在桌面生成的excel文件\n",
    "df2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时，这个excel已经读进了python里面，与之前的df虽然长的一样，但已经是两个表格了。我们来尝试对这个df2做些改变："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "增加一列： col4，使数值为1\n",
      "   col1  col2  col3  col4\n",
      "0     1     4     7     1\n",
      "1     2     5     8     1\n",
      "2     3     6     9     1\n",
      "3     4     7    10     1 \n",
      "\n",
      "将第一列的第4个数改成0\n",
      "   col1  col2  col3  col4\n",
      "0     1     4     7     1\n",
      "1     2     5     8     1\n",
      "2     3     6     9     1\n",
      "3     0     7    10     1\n"
     ]
    }
   ],
   "source": [
    "print('增加一列： col4，使数值为1')\n",
    "df2['col4'] = 1\n",
    "print(df2,'\\n')\n",
    "print('将第一列的第4个数改成0')\n",
    "df2['col1'][3] = 0\n",
    "print(df2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以尝试把df2再保存到桌面，如果文件名不变的话，将覆盖掉df："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.to_excel('dataframe.xlsx')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时你再到桌面打开\"dataframe.xlsx\"，里面的内容已经被修改了。\n",
    "\n",
    "除了excel，DataFrame还可以保存成.csv格式。CSV是一个非常常见也非常重要的表格格式，与\".xlsx\"差不多。我们来简单感受一下读写CSV。之后也是会经常与它见面的。\n",
    "\n",
    "\n",
    "# 读/写 csv\n",
    "\n",
    "将刚才的df2保存成csv格式，使用to_csv()，并且括号里文件的后缀也需要修改："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.to_csv('df2.csv')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "此时桌面已经有了df2.csv文件，是不是很简单。同样的，我们再来尝试读取这个csv文件："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Unnamed: 0</th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Unnamed: 0  col1  col2  col3  col4\n",
       "0           0     1     4     7     1\n",
       "1           1     2     5     8     1\n",
       "2           2     3     6     9     1\n",
       "3           3     0     7    10     1"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = pd.read_csv('df2.csv')\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "我这里的用csv读取的df多增加了Unnamed: 0这一列。我使用list(df)来检查下列名："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['Unnamed: 0', 'col1', 'col2', 'col3', 'col4']"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(df3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "的确从4列变成了5列，是因为把index索引那一列也变成了真正的一列，因此我要把这没用的一列删掉："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col1</th>\n",
       "      <th>col2</th>\n",
       "      <th>col3</th>\n",
       "      <th>col4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   col1  col2  col3  col4\n",
       "0     1     4     7     1\n",
       "1     2     5     8     1\n",
       "2     3     6     9     1\n",
       "3     0     7    10     1"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3 = df3.drop('Unnamed: 0',1)\n",
    "df3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "大功告成！\n",
    "\n",
    "敲代码，有时候，就是这么简单！哈哈哈哈！在编程的世界里，办法总是比困难多~\n",
    "\n",
    "👩好啦，今天的课程就到这里啦！咱们下次见！~"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
